Databricks Certified Machine Learning Professional Exam Questions and Answers

Question 1

Which of the following statements describes streaming with Spark as a model deployment strategy?

A. The inference of batch processed records as soon as a trigger is hit
B. The inference of all types of records in real-time
C. The inference of batch processed records as soon as a Spark job is run
D. The inference of incrementally processed records as soon as a trigger is hit
E. The inference of incrementally processed records as soon as a Spark job is run

Answer : D
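Streaming differs from batch in that each trigger scores only the records that arrived since the previous trigger. The toy, pure-Python sketch below models that behavior. It is not the Spark API: the append-only `source` list stands in for a streaming source, the `offset` field stands in for a checkpoint, and the scoring lambda is hypothetical.

```python
from typing import Callable, List


class ToyStream:
    """Toy model of a streaming query: each trigger scores only newly arrived records."""

    def __init__(self, source: List[float], predict: Callable[[float], float]):
        self.source = source    # append-only input, like rows landing in a table
        self.predict = predict  # scoring function applied to each new record
        self.offset = 0         # stand-in for a checkpoint: next unprocessed index
        self.sink: List[float] = []

    def trigger(self) -> int:
        """Process records that arrived since the last trigger; return how many."""
        new_records = self.source[self.offset:]
        self.sink.extend(self.predict(r) for r in new_records)
        self.offset += len(new_records)
        return len(new_records)


source = [1.0, 2.0]
stream = ToyStream(source, predict=lambda x: 10 * x)
first = stream.trigger()   # scores the 2 initial records
source.append(3.0)         # a new record arrives
second = stream.trigger()  # scores only the 1 new record
third = stream.trigger()   # nothing new, so nothing is scored
```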

Question 2

A machine learning engineer has deployed a model named recommender using MLflow Model Serving. They now want to query the version of that model that is in the Production stage of the MLflow Model Registry.
Which of the following model URIs can be used to query the described model version?

A. https://<databricks-instance>/model-serving/recommender/Production/invocations
B. The version number of the model version in Production is necessary to complete this task.
C. https://<databricks-instance>/model/recommender/stage-production/invocations
D. https://<databricks-instance>/model-serving/recommender/stage-production/invocations
E. https://<databricks-instance>/model/recommender/Production/invocations

Answer : E
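Because the stage appears in the URL, a client can target whichever version currently sits in Production without knowing its number. A minimal sketch of building such a request with the Python standard library follows. The host, token, and input record are hypothetical, the `dataframe_records` body assumes MLflow's JSON scoring format, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_invocation_request(host: str, model_name: str, stage_or_version: str,
                             token: str, records: list) -> urllib.request.Request:
    """Build (but do not send) a scoring request for a served registry model."""
    url = f"https://{host}/model/{model_name}/{stage_or_version}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


req = build_invocation_request(
    host="example.cloud.databricks.com",  # hypothetical workspace host
    model_name="recommender",
    stage_or_version="Production",        # a stage name instead of a version number
    token="<personal-access-token>",      # placeholder; never hard-code real tokens
    records=[{"user_id": 42}],            # hypothetical input schema
)
```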

Question 3

Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?

A. Cloud-based compute
B. None of these tools
C. REST APIs
D. Containers
E. Autoscaling clusters

Answer : D

Question 4

A machine learning engineer has registered a sklearn model in the MLflow Model Registry using the sklearn model flavor with URI model_uri.
Which of the following operations can be used to load the model as an sklearn object for batch deployment?

A. mlflow.spark.load_model(model_uri)
B. mlflow.pyfunc.read_model(model_uri)
C. mlflow.sklearn.read_model(model_uri)
D. mlflow.pyfunc.load_model(model_uri)
E. mlflow.sklearn.load_model(model_uri)

Answer : E

Question 5

A data scientist set up a machine learning pipeline to automatically log a data visualization with each run. They now want to view the visualizations in Databricks.
Which of the following locations in Databricks will show these data visualizations?

A. The MLflow Model Registry Model page
B. The Artifacts section of the MLflow Experiment page
C. Logged data visualizations cannot be viewed in Databricks
D. The Artifacts section of the MLflow Run page
E. The Figures section of the MLflow Run page

Answer : D

Question 6

A data scientist has developed a scikit-learn model sklearn_model and they want to log the model using MLflow.
They write the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?

A. mlflow.spark.track_model(sklearn_model, "model")
B. mlflow.sklearn.log_model(sklearn_model, "model")
C. mlflow.spark.log_model(sklearn_model, "model")
D. mlflow.sklearn.load_model("model")
E. mlflow.sklearn.track_model(sklearn_model, "model")

Answer : B

Question 7

Which of the following describes the concept of MLflow Model flavors?

A. A convention that deployment tools can use to wrap preprocessing logic into a Model
B. A convention that MLflow Model Registry can use to version models
C. A convention that MLflow Experiments can use to organize their Runs by project
D. A convention that deployment tools can use to understand the model
E. A convention that MLflow Model Registry can use to organize its Models by project

Answer : D

Question 8

In a continuous integration, continuous deployment (CI/CD) process for machine learning pipelines, which of the following events commonly triggers the execution of automated testing?

A. The launch of a new cost-efficient SQL endpoint
B. CI/CD pipelines are not needed for machine learning pipelines
C. The arrival of a new feature table in the Feature Store
D. The launch of a new cost-efficient job cluster
E. The arrival of a new model version in the MLflow Model Registry

Answer : E

Question 9

A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.
Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?

A. Z-Ordering
B. Bin-packing
C. Write as a Parquet file
D. Data skipping
E. Tuning the file size

Answer : A
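Z-ordering sorts rows along a space-filling curve that interleaves the bits of several columns, so rows that are close in all of those columns land in the same files and per-file min/max statistics can skip more files. The pure-Python sketch below illustrates the idea with a classic Morton (Z-order) key on two hypothetical integer columns. It is a conceptual model only; Delta Lake's own implementation, invoked with `OPTIMIZE <table> ZORDER BY (<columns>)`, differs in its details.

```python
from typing import List, Tuple

Row = Tuple[int, int]


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two non-negative ints into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x land on even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y land on odd positions
    return key


def split_into_files(rows: List[Row], rows_per_file: int) -> List[List[Row]]:
    """Cut an already ordered list of rows into equally sized 'files'."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_touched(files: List[List[Row]], col: int, value: int) -> int:
    """Count files whose min/max statistics on `col` cannot rule out `value`."""
    return sum(
        1 for f in files
        if min(r[col] for r in f) <= value <= max(r[col] for r in f)
    )


# Hypothetical table: every (col_a, col_b) pair on a 4x4 grid, four rows per file.
grid = [(a, b) for a in range(4) for b in range(4)]
zordered = split_into_files(sorted(grid, key=lambda r: morton_key(*r)), 4)
linear = split_into_files(sorted(grid), 4)  # plain sort by col_a, then col_b

for name, files in [("z-order", zordered), ("linear", linear)]:
    print(name, "col_a == 0:", files_touched(files, 0, 0),
          "col_b == 0:", files_touched(files, 1, 0))
```

With the plain sort, a filter on `col_a` reads one file but a filter on `col_b` reads all four; the Z-ordered layout reads two files for either filter, which is the multi-column benefit the question describes.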

Question 10

A machine learning engineer needs to deliver predictions of a machine learning model in real-time. However, the feature values needed for computing the predictions are available one week before the query time.
Which of the following is a benefit of using a batch serving deployment in this scenario rather than a real-time serving deployment where predictions are computed at query time?

A. Batch serving has built-in capabilities in Databricks Machine Learning
B. There is no advantage to using batch serving deployments over real-time serving deployments
C. Computing predictions in real-time provides more up-to-date results
D. Testing is not possible in real-time serving deployments
E. Querying stored predictions can be faster than computing predictions in real-time

Answer : E
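When features are known well ahead of query time, the expensive scoring can move to a scheduled batch job and the online path becomes a key lookup. The toy sketch below shows that split; the in-memory dict stands in for a Delta table or key-value store, and `toy_model` and the feature values are hypothetical.

```python
from typing import Callable, Dict


def precompute_predictions(feature_rows: Dict[str, dict],
                           model: Callable[[dict], float]) -> Dict[str, float]:
    """Batch step: score every key ahead of time and store the results."""
    return {key: model(features) for key, features in feature_rows.items()}


def serve(prediction_store: Dict[str, float], key: str) -> float:
    """Serving step: a lookup, with no feature retrieval or model call at query time."""
    return prediction_store[key]


# Hypothetical features that are available a week before query time.
features = {"cust_1": {"spend": 120.0}, "cust_2": {"spend": 15.0}}
calls = []


def toy_model(row: dict) -> float:
    calls.append(row)  # record each time the model actually runs
    return 1.0 if row["spend"] > 100 else 0.0


store = precompute_predictions(features, toy_model)  # runs the model twice
answer = serve(store, "cust_1")                       # no model call
```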

Question 11

A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.
Which of the following code blocks can they use to create a function called predict that they can use to complete the task?

A.
B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.
C.
D.
E.

Answer : E

Question 12

Which of the following describes the purpose of the context parameter in the predict method of Python models for MLflow?

A. The context parameter allows the user to specify which version of the registered MLflow Model should be used based on the given application's current scenario
B. The context parameter allows the user to document the performance of a model after it has been deployed
C. The context parameter allows the user to include relevant details of the business case to allow downstream users to understand the purpose of the model
D. The context parameter allows the user to provide the model with completely custom if-else logic for the given application's current scenario
E. The context parameter allows the user to provide the model access to objects like preprocessing models or custom configuration files

Answer : E

Question 13

A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.
Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?

A. df = fs.get_missing_features(spark_df, model_uri)
   fs.score_model(model_uri, df)
B. fs.score_model(model_uri, spark_df)
C. df = fs.get_missing_features(spark_df, model_uri)
   fs.score_batch(model_uri, df)
D. df = fs.get_missing_features(spark_df)
   fs.score_batch(model_uri, df)
E. fs.score_batch(model_uri, spark_df)

Answer : E
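Conceptually, `score_batch` uses the feature lookups recorded when the model was logged to fetch the missing columns by `customer_id` before applying the model. The pandas sketch below models only that join-then-predict behavior, with hypothetical tables and a toy model; it is an analogy, not the Feature Store client, whose actual call remains `fs.score_batch(model_uri, spark_df)`.

```python
import pandas as pd

# Hypothetical feature table keyed by customer_id (what the Feature Store holds).
feature_table = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "tenure_months": [24, 3, 12],
})

# Batch to score: it has the primary key but lacks the stored feature.
batch = pd.DataFrame({"customer_id": [3, 1], "monthly_spend": [50.0, 80.0]})


def toy_model(df: pd.DataFrame) -> pd.Series:
    """Hypothetical model trained on tenure_months and monthly_spend."""
    return (df["tenure_months"] * df["monthly_spend"] > 1000).astype(int)


def score_with_lookup(batch_df: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Join missing features on the primary key, then append a prediction column."""
    joined = batch_df.merge(features, on="customer_id", how="left")
    return joined.assign(prediction=toy_model(joined))


scored = score_with_lookup(batch, feature_table)
```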

Question 14

A machine learning engineer needs to select a deployment strategy for a new machine learning application. The feature values are not available until the time of delivery, and results are needed exceedingly fast for one record at a time.
Which of the following deployment strategies can be used to meet these requirements?

A. Edge/on-device
B. Streaming
C. None of these strategies will meet the requirements.
D. Batch
E. Real-time

Answer : E

Question 15

A machine learning engineer is using the following code block as part of a batch deployment pipeline:

Which of the following changes needs to be made so this code block will work when the inference table is a stream source?

A. Replace "inference" with the path to the location of the Delta table
B. Replace schema(schema) with option("maxFilesPerTrigger", 1)
C. Replace spark.read with spark.readStream
D. Replace format("delta") with format("stream")
E. Replace predict with a stream-friendly prediction function

Answer : C

Certified Machine Learning Professional v1.0
